
Identifying the Rich: Registration, Taxation, and Access to the State in Tanzania
Jeremy Bowles

Replication Readme

Software used: Stata/MP 17.0, R 4.1.2

Required packages (Stata): esttab (part of the estout package), reghdfe, ivreghdfe, boottest.
Required packages (R): lfe, tidyverse, readxl, viridis, haven, sf, modelsummary.


To replicate all tables and figures in the paper, run the files in the following order:

(1) make_tables_census.do		: Stata file which generates all tables using Census data (Tables 1, 2, 4, A1, A4-A9, A11) and outputs to "Tables" subfolder.
(2) make_tables_NPS.do			: Stata file which generates all tables using NPS data (Tables 3, 5, A2, A4e, A10) and outputs to "Tables" subfolder.
(3) make_figures.R			: R file which generates all figures in the paper and supplementary materials (Figures 1, A1-A11) and outputs to "Figures" subfolder. Also generates Table 5 and outputs to "Tables" subfolder.

All tables and figures are named according to their position in the paper. Table A4 ("First stage (robustness)") requires running both (1) and (2) in sequence, since its bottom panel relies on NPS data.
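The run order above can be scripted. The sketch below is not part of the replication package; it assumes a Unix-like system where Stata/MP's console binary is called `stata-mp` and R's is `Rscript`, both on the PATH, and that it is launched from the root folder. The executable names are assumptions to adjust for your installation.

```python
import shutil
import subprocess

# Assumed executable names (not specified in this README); adjust as needed.
STATA = "stata-mp"
RSCRIPT = "Rscript"

# Run order from this README. Table A4 needs outputs from both Stata files.
STEPS = [
    [STATA, "-b", "do", "make_tables_census.do"],
    [STATA, "-b", "do", "make_tables_NPS.do"],
    [RSCRIPT, "make_figures.R"],
]


def run_all(steps=STEPS, dry_run=False):
    """Run each step in order from the current directory; stop at the first failure.

    With dry_run=True nothing is executed and the planned commands are returned.
    """
    if dry_run:
        return [list(cmd) for cmd in steps]
    for cmd in steps:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        # check=True raises CalledProcessError on a non-zero exit status. Stata's
        # batch mode may not report do-file errors this way, so also inspect the
        # .log file Stata writes in the working directory.
        subprocess.run(cmd, check=True)
    return [list(cmd) for cmd in steps]
```

For example, `run_all(dry_run=True)` lists the three commands without executing anything, while `run_all()` runs them in sequence.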

All tables and figures from the main paper and the online appendix are embedded in "main_and_appendix.tex". Compiling this .tex file to PDF updates all tables and figures in the main paper and the online appendix.

Additional results are embedded in "additional_results.tex". These include: 
(1) "Tables/With controls/": All tables from the main paper and appendix, including the coefficients on control variables (which are omitted from the main paper for clarity).
(2) "Tables/Additional results/": All analytical results (i.e., Tables 1-5) re-estimated under each of the robustness specifications used in Table A4 ("First stage (robustness)"). Files are named after the corresponding panels of Table A4: for example, "Tables/Additional results/Table 2_A1.tex" estimates Table 2 while restricting the sample to +/- 5 cohorts (i.e., the first half of panel A in Table A4).
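To locate the robustness versions of a given table programmatically, the file names can be parsed. This is a hypothetical helper: the pattern "Table <n>_<panel letter><number>.tex" is inferred from the single example above ("Table 2_A1.tex"), and reading the trailing number as the sub-specification within a Table A4 panel is an assumption.

```python
import re
from pathlib import Path

# Pattern inferred from "Table 2_A1.tex"; an assumption, not a documented convention.
NAME_RE = re.compile(r"^Table (\d+)_([A-Z])(\d+)\.tex$")


def parse_additional_result(path):
    """Return (table, panel, sub_spec) for a file in Tables/Additional results/."""
    name = Path(path).name
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unexpected file name: {name}")
    table, panel, sub_spec = match.groups()
    return int(table), panel, int(sub_spec)


def files_for_panel(folder, panel):
    """List (table, sub_spec, path) for every file tied to one Table A4 panel.

    Files whose names do not follow the assumed pattern are skipped.
    """
    out = []
    for p in sorted(Path(folder).glob("Table *_*.tex")):
        try:
            table, pnl, sub = parse_additional_result(p)
        except ValueError:
            continue
        if pnl == panel:
            out.append((table, sub, p))
    return out
```

For instance, `files_for_panel("Tables/Additional results", "A")` would collect every table re-estimated under the panel A specifications.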

Other files in root folder:

Codebook.pdf			: Codebook for datasets.
main_and_appendix.tex		: All tables and figures in main paper/online appendix produced using replication data.
main_and_appendix.pdf		: All tables and figures in main paper/online appendix produced using replication data (output).
additional_results.tex		: All tables produced in supplementary analysis.
additional_results.pdf		: All tables produced in supplementary analysis (output).


Files in the "Data" subfolder (see Codebook for information on particular variables):

Data/Cross-national/registration.csv: 			Cross-national data on income, registration, state capacity, and inequality (source: World Bank, 2018; UNICEF, 2017; Hanson & Sigman, 2021).
Data/Cross-national/crossnat_capacity.csv: 		Annual data on state capacity and exclusion from public resources (source: Hanson & Sigman, 2021; V-Dem, 2021).
Data/Cross-national/civil_registries.csv: 		Cumulative share of countries in sub-Saharan Africa with civil registries by year (source: World Bank, 2017).

Data/Descriptive/registration_orders_by_year.csv: 	Cumulative share of districts in Tanzania with compulsory registration orders by year (source: Tanzania National Archives).
Data/Descriptive/legislation.csv: 			Incidence of particular topics in Tanzanian legislation between 1962 and 1970 (source: Southern African Legal Information Institute).

Data/GIS/districts.shp: 				Shapefile of 2012 Tanzanian district boundaries (source: Tanzanian national census, 2012).
Data/GIS/tz_town_locations.csv: 			Geolocations of towns/former towns (source: Tanzanian national census, 1967).

Data/District-level/taxation.dta: 			District-level socio-economic variables (source: Jensen & Mkama, 1968 and Lee, 1965).
Data/District-level/schools.dta: 			District-level presence of schools as of 1966 (source: Ministry of Education).
Data/District-level/treatment_map.dta: 			Crosswalk mapping districts (as of 2012) to the districts that existed during the reform period, and to (different versions of) treatment assignment.
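Before running the scripts, it can help to confirm that every data file listed above is in place. The sketch below is a hypothetical pre-flight check, not part of the package. Because the ESRI shapefile format requires companion .shx and .dbf files, it assumes these sit next to districts.shp under the same base name.

```python
from pathlib import Path

# Data files listed in this README, relative to the root folder.
DATA_FILES = [
    "Data/Cross-national/registration.csv",
    "Data/Cross-national/crossnat_capacity.csv",
    "Data/Cross-national/civil_registries.csv",
    "Data/Descriptive/registration_orders_by_year.csv",
    "Data/Descriptive/legislation.csv",
    "Data/GIS/districts.shp",
    "Data/GIS/tz_town_locations.csv",
    "Data/District-level/taxation.dta",
    "Data/District-level/schools.dta",
    "Data/District-level/treatment_map.dta",
]

# Assumed: the shapefile's mandatory sidecar files share its base name.
SHAPEFILE_SIDECARS = [".shx", ".dbf"]


def missing_data_files(root="."):
    """Return the expected data files (as POSIX-style relative paths) absent under root."""
    root = Path(root)
    expected = []
    for rel in DATA_FILES:
        path = root / rel
        expected.append(path)
        if path.suffix == ".shp":
            expected.extend(path.with_suffix(ext) for ext in SHAPEFILE_SIDECARS)
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]
```

Running `missing_data_files()` from the root folder should return an empty list when the data are complete.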